Real-time Activity Recognition by Discerning Qualitative Relationships Between Randomly Chosen Visual Features
Abstract
Motivation. Automatic recognition of human activities (or events) from video is important to many potential applications of computer vision. One of the most common approaches is the bag-of-visual-features, which aggregates space-time features globally over the entire video clip containing the complete execution of a single activity. The bag-of-visual-features does not encode the spatio-temporal structure of the video. For this reason, there is growing interest in modeling the spatio-temporal structure between visual features in order to improve activity recognition results.

The proposed framework. We model the spatio-temporal structure by exploiting the qualitative relationships between a pair of visual features. The proposed approach is inspired by [3, 4]. The goal is to find a pair of visual features whose spatio-temporal relationships are discriminative enough, and temporally consistent enough, to distinguish various activities. The framework is applied to recognize activities from a continuous live video (egocentric view) of a person performing manipulative tasks in an industrial setup. In such environments, the purpose of activity recognition is to assist users by providing on-the-fly instructions from an automatic system that maintains an understanding of the ongoing activities. In order to recognize activities in real time, we propose a random forest with a discriminative Markov decision tree algorithm that considers a random subset of relational features at a time, together with a Markov temporal structure that provides temporally smoothed output (Fig. 1). Our algorithm differs from conventional decision trees [2]: it uses a linear SVM as the classifier at each nonterminal node and explores temporal dependency at the terminal nodes of the trees. We explicitly model the spatial relationships left, right, top, bottom, very-near, near, far and very-far, as well as the temporal relationships during, before and after, between a pair of visual features (Fig. 2), which are selected randomly at the nonterminal nodes of a given Markov decision tree. Our hypothesis is that the proposed relationships are particularly suitable for detecting complex non-periodic manipulative tasks and can easily be applied to existing visual descriptors such as SIFT, STIP, CUBOID and SURF.

Growing discriminative Markov decision trees. Each tree is trained separately on a random subset of frames belonging to the training videos. Learning proceeds recursively by splitting the training frames at internal nodes into left and right subsets. This is done in four stages: (1) randomly assign all frames from each activity class to a binary label; (2) randomly sample a pair of visual words; (3) compute the histogram $h$ of spatio-temporal relationships between them; and (4) use a linear SVM to learn a binary split from the extracted $h$. The binary SVM at each internal node sends a frame to the left child if $w^{\top} h \le 0$ and to the right child otherwise, where $w$ is the weight vector learned by the linear SVM. Each binary split, which corresponds to a pair of visual words, is evaluated with an information gain criterion on the training frames that fall in the current node. Finally, the split that maximizes the information gain is selected. The splitting process is repeated on the newly formed subsets until the current node is considered a leaf node.

Inference. For real-time activity recognition, the proposed inference algorithm computes the posterior marginals $P(a_t \mid l_1^{\tau}, \ldots, l_t^{\tau})$ of all activities $a_t$ for a frame $I_t$, given the history of visited leaf nodes $l_1^{\tau}, \ldots, l_t^{\tau}$ (Fig. 1b) for a particular tree $\tau$. The smoothed output over the whole forest is obtained by averaging the posterior probabilities from all $T$ trees.
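The per-tree filtering and the forest-level averaging described under Inference can be sketched as follows. This is a minimal illustration and not the authors' implementation. It assumes each tree behaves like a hidden Markov model over activities, with a hypothetical transition matrix `transition` shared by all trees and a hypothetical per-tree table `emissions[tau][a, leaf]` approximating P(leaf | activity). Neither the names nor this parameterization come from the abstract. Under these assumptions the smoothed output is $\frac{1}{T}\sum_{\tau=1}^{T} P(a_t \mid l_1^{\tau}, \ldots, l_t^{\tau})$.

```python
import numpy as np

def forward_step(prev_posterior, transition, leaf_likelihood):
    """One filtering update for a single tree.

    prev_posterior[i]  : P(a_{t-1} = i | leaves visited so far)
    transition[i, j]   : assumed P(a_t = j | a_{t-1} = i)
    leaf_likelihood[j] : assumed P(current leaf | a_t = j)
    Assumes the visited leaf has nonzero likelihood for some activity.
    """
    predicted = transition.T @ prev_posterior   # marginalize over a_{t-1}
    unnormalized = leaf_likelihood * predicted  # weight by the visited leaf
    return unnormalized / unnormalized.sum()

def forest_posterior(prev_posteriors, transition, emissions, leaves):
    """Update every tree's posterior, then average over the T trees."""
    updated = [
        forward_step(prev, transition, emission[:, leaf])
        for prev, emission, leaf in zip(prev_posteriors, emissions, leaves)
    ]
    return updated, np.mean(updated, axis=0)

# Toy setup (all numbers invented): 2 activities, T = 2 trees, 3 leaves each.
transition = np.array([[0.9, 0.1],
                       [0.1, 0.9]])
emissions = [
    np.array([[0.7, 0.2, 0.1],    # tree 0: P(leaf | activity 0)
              [0.1, 0.2, 0.7]]),  # tree 0: P(leaf | activity 1)
    np.array([[0.6, 0.3, 0.1],
              [0.2, 0.3, 0.5]]),
]
posteriors = [np.array([0.5, 0.5]) for _ in emissions]  # uniform prior

# Each tuple holds the leaf reached in tree 0 and tree 1 for one frame.
for leaves in [(0, 0), (0, 1), (2, 2)]:
    posteriors, smoothed = forest_posterior(posteriors, transition, emissions, leaves)
    print(leaves, smoothed)
```

Because each update only needs the previous posterior and the current leaf, the cost per frame is independent of the video length, which is what makes this kind of recursion attractive for live video.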
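The node-splitting stages described under "Growing discriminative Markov decision trees" can be illustrated with a small sketch. It is an assumption-laden example, not the paper's code: the distance thresholds `dist_edges`, the tolerance `dt_tol` used for "during", the dominant-axis rule for left/right/top/bottom, and the weight vector `w` are all hypothetical, since the abstract does not specify them. Training the linear SVM itself is omitted; `w` stands in for its learned weights.

```python
import numpy as np

RELATIONS = ["left", "right", "top", "bottom",
             "very-near", "near", "far", "very-far",
             "during", "before", "after"]

def relation_histogram(feats_a, feats_b, dist_edges=(20.0, 60.0, 120.0), dt_tol=0):
    """Normalized histogram h of qualitative relations of word A relative to word B.

    feats_a, feats_b: arrays with rows (x, y, t), one row per feature occurrence.
    dist_edges and dt_tol are hypothetical thresholds (pixels, frames).
    """
    h = np.zeros(len(RELATIONS))
    for xa, ya, ta in feats_a:
        for xb, yb, tb in feats_b:
            dx, dy, dt = xa - xb, ya - yb, ta - tb
            # Direction: the dominant axis decides (image y grows downward).
            if abs(dx) >= abs(dy):
                h[0 if dx < 0 else 1] += 1
            else:
                h[2 if dy < 0 else 3] += 1
            # Distance: one of four bins given by three thresholds.
            h[4 + int(np.searchsorted(dist_edges, np.hypot(dx, dy)))] += 1
            # Time: during (within tolerance), before, or after.
            if abs(dt) <= dt_tol:
                h[8] += 1
            elif dt < 0:
                h[9] += 1
            else:
                h[10] += 1
    total = h.sum()
    return h / total if total > 0 else h

def split_frames(H, w):
    """Boolean mask: True sends a frame left (w^T h <= 0), False sends it right."""
    return H @ w <= 0

def entropy(labels):
    _, counts = np.unique(labels, return_counts=True)
    p = counts / counts.sum()
    return float(-(p * np.log2(p)).sum())

def information_gain(labels, goes_left):
    """Entropy reduction of the activity labels achieved by a binary split."""
    gain = entropy(labels)
    for mask in (goes_left, ~goes_left):
        if mask.any():
            gain -= mask.sum() / len(labels) * entropy(labels[mask])
    return gain

# Toy frame: one occurrence of each visual word, rows are (x, y, t).
h = relation_histogram(np.array([[10, 50, 3]]), np.array([[40, 52, 5]]))

# Hypothetical weights: "before" pushes a frame left, "after" pushes it right.
w = np.zeros(len(RELATIONS))
w[RELATIONS.index("before")] = -1.0
w[RELATIONS.index("after")] = 1.0
goes_left = split_frames(h[None, :], w)

# Information gain of a candidate split over four labeled frames.
labels = np.array([0, 0, 1, 1])
gain = information_gain(labels, np.array([True, True, False, False]))
```

In a full tree-growing loop, one would repeat this for several randomly sampled word pairs and keep the split with the highest `information_gain`, as the text describes.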
Related resources
Development and Evaluation of Real-Time RT-PCR Test for Quantitative and Qualitative Recognition of Current H9N2 Subtype Avian Influenza Viruses in Iran
Avian influenza H9N2 subtype viruses have had a great impact on the economy of Iranian industrial poultry production since their introduction into the country. To enable rapid and precise identification of these viruses as a control measure in the poultry industry, a real-time probe-based assay was developed to directly detect a specific influenza virus of the H9N2 subtype, instead of general detection of Influenza A ...
Recognition of Visual Events using Spatio-Temporal Information of the Video Signal
Recognition of visual events as a video analysis task has become popular in the machine learning community. While traditional approaches to detecting video events have been used for a long time, recently developed deep learning based methods have revolutionized this area. They have enabled event recognition systems to achieve detection rates which were not reachable by traditional approac...
Aircraft Visual Identification by Neural Networks
In the present paper, an efficient method for three-dimensional aircraft pattern recognition is introduced. In this method, a set of simple area-based features extracted from the silhouettes of aerial vehicles is used to recognize an aircraft type from its optical or infrared images taken by a CCD camera or a FLIR sensor. These images can be taken from any direction and distance relative to the fly...
Vehicle Logo Recognition Using Image Matching and Textural Features
In recent years, automatic recognition of vehicle logos has become an important issue in modern cities. This is due to the ever-growing number of cars and transportation systems, which makes it impossible for humans to fully manage and monitor them. In this research, an automatic real-time logo recognition system for moving cars is introduced based on histogram manipulation. In the proposed...
Real-time Action Recognition by Spatiotemporal Semantic and Structural Forests
This paper presents a novel real-time action recogniser that utilises both local appearance and structural information. Our method is able to recognise actions continuously in real time while achieving accuracy comparable to or higher than state-of-the-art methods. Run-time speed is of vital importance in real-world action recognition systems, but existing methods seldom take computational complexity into full...